Matrices, Vector Spaces, and Information Retrieval

Authors

  • Michael W. Berry
  • Zlatko Drmac
  • Elizabeth R. Jessup
Abstract

The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no shortage of textual materials on a particular topic, procedures for indexing or extracting the knowledge or conceptual information contained in them can be lacking. Recently developed information retrieval technologies are based on the concept of a vector space. Data are modeled as a matrix, and a user’s query of the database is represented as a vector. Relevant documents in the database are then identified via simple vector operations. Orthogonal factorizations of the matrix provide mechanisms for handling uncertainty in the database itself. The purpose of this paper is to show how such fundamental mathematical concepts from linear algebra can be used to manage and index large text collections.
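The vector space idea in the abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: the toy term-by-document matrix `A`, the term list, the query, and the helper names `column`, `cosine`, and `rank_documents` are all hypothetical. Cosine similarity is assumed as the "simple vector operation" used to compare the query with each document column.

```python
import math

# Hypothetical toy collection: rows are terms, columns are documents.
terms = ["matrix", "vector", "retrieval", "library"]
A = [
    [1, 0, 1],  # "matrix" appears in documents 0 and 2
    [1, 1, 0],  # "vector" appears in documents 0 and 1
    [0, 1, 1],  # "retrieval" appears in documents 1 and 2
    [0, 0, 1],  # "library" appears in document 2
]


def column(matrix, j):
    """Return column j of a row-major matrix (one document vector)."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(matrix, query):
    """Score every document column against the query, best match first."""
    n_docs = len(matrix[0])
    scores = [(j, cosine(column(matrix, j), query)) for j in range(n_docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# A query is a vector over the same terms: here "vector" and "retrieval".
query = [0, 1, 1, 0]
ranking = rank_documents(A, query)

for doc_id, score in ranking:
    print(f"document {doc_id}: cosine = {score:.3f}")
```

In this toy collection, document 1 contains exactly the query terms and ranks first. A real system would typically weight the matrix entries and work with a low-rank approximation obtained from an orthogonal factorization, which this sketch leaves out.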

Similar resources

Vectors, Planes and Context

Information Retrieval (IR) models based on vector spaces have been investigated for a long time. Nevertheless, they have recently attracted further research interest beyond the classical statistical view of vectors and matrices. Moreover, “context” has been recognized as a crucial component of IR systems. As the way context affects IR systems is very complex, a principled approach to modeling a...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Exploring the relationship between feature and perceptual visual spaces

Visual information (images or videos) is increasing and thereby demanding appropriate ways to represent and search these information spaces. Their visualization often relies on reducing the dimensions of the information space to create a lower-dimensional feature space which, from the point of view of the end user, will be viewed and interpreted as a perceptual space. Critically for information...

s-Topological vector spaces

In this paper, we have defined and studied a generalized form of topological vector spaces called s-topological vector spaces. s-topological vector spaces are defined by using semi-open sets and semi-continuity in the sense of Levine. Along with other results, it is proved that every s-topological vector space is a generalized homogeneous space. Every open subspace of an s-topological vector space is...

Journal:
  • SIAM Review

Volume 41  Issue 

Pages  -

Publication date 1999